 variational information bottleneck



WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent

arXiv.org Artificial Intelligence

LLM-brained web agents offer powerful capabilities for web automation but face a critical cost-performance trade-off. The challenge is amplified by web agents' inherently complex prompts that include goals, action histories, and environmental states, leading to degraded LLM ensemble performance. To address this, we introduce WebRouter, a novel query-specific router trained from an information-theoretic perspective. Our core contribution is a cost-aware Variational Information Bottleneck (ca-VIB) objective, which learns a compressed representation of the input prompt while explicitly penalizing the expected operational cost. Experiments on five real-world websites from the WebVoyager benchmark show that WebRouter reduces operational costs by a striking 87.8% compared to a GPT-4o baseline, while incurring only a 3.8% accuracy drop.
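One way to read "explicitly penalizing the expected operational cost" is as an extra term added to a VIB-style loss: the router's probability of choosing each candidate LLM, weighted by that LLM's cost. The sketch below illustrates only that reading. The function names, the `cost_weight` coefficient, and the additive form of the loss are assumptions made for illustration and are not taken from the WebRouter paper.

```python
import math


def softmax(scores):
    """Turn raw router scores into a probability distribution over models."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(router_scores, model_costs):
    """Cost of each candidate model, averaged under the router's distribution."""
    probs = softmax(router_scores)
    return sum(p * c for p, c in zip(probs, model_costs))


def cost_penalized_loss(task_loss, kl_term, router_scores, model_costs,
                        beta, cost_weight):
    """Hypothetical cost-aware loss: prediction + compression + cost penalty.

    `beta` weights the compression (KL) term as in a standard VIB loss;
    `cost_weight` is an assumed coefficient for the expected-cost penalty.
    """
    return (task_loss
            + beta * kl_term
            + cost_weight * expected_cost(router_scores, model_costs))
```

With equal router scores for two models whose (hypothetical) costs are 1 and 3, `expected_cost([0.0, 0.0], [1.0, 3.0])` averages them; raising the score of the cheaper model lowers the penalty.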



Reviews: Relevant sparse codes with variational information bottleneck

Neural Information Processing Systems

I find the paper novel and interesting. To my knowledge the algorithm is original and it adds to the existing toolbox of IB-based approaches. The proposed method seems to outperform Gaussian IB on denoising and occlusion/inpainting tasks on simulated and real data. It also provides new analysis tools for sparse representations in the form of IB information curves. Overall I think this work has many promising applications in machine learning and neuroscience and would be of interest to the NIPS audience.


A Distance Metric Learning Model Based On Variational Information Bottleneck

arXiv.org Artificial Intelligence

In recent years, personalized recommendation technology has flourished and become a hot research direction. The matrix factorization model and the metric learning model, which were proposed in succession, have been widely studied and applied. The latter uses Euclidean distance, rather than the dot product used by the former, to compare latent space vectors. While this avoids the shortcomings of the dot product, the assumption underlying Euclidean distance is neglected, which limits the recommendation quality of the model. To solve this problem, this paper combines the Variational Information Bottleneck with a metric learning model for the first time and proposes a new metric learning model, VIB-DML (Variational Information Bottleneck Distance Metric Learning), for rating prediction. VIB-DML limits the mutual information of the latent space feature vector to improve the robustness of the model, and satisfies the assumption of Euclidean distance by decoupling the latent space feature vector. The experimental results are compared using the root mean square error (RMSE) on three public datasets. The results show that the generalization ability of VIB-DML is excellent: compared with the general metric learning model MetricF, the prediction error is reduced by 7.29%. Finally, the paper demonstrates the strong robustness of VIB-DML through experiments.
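For readers unfamiliar with the two scoring rules contrasted in this abstract, the following minimal sketch shows a dot-product score (as in matrix factorization), a Euclidean distance (as in metric learning), and the RMSE used for evaluation. The function names and example vectors are illustrative assumptions; this is not the VIB-DML model itself.

```python
import math


def dot_score(user_vec, item_vec):
    """Matrix-factorization style affinity: larger means a better match."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def euclidean_distance(user_vec, item_vec):
    """Metric-learning style dissimilarity: smaller means a better match."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(user_vec, item_vec)))


def rmse(predicted, actual):
    """Root mean square error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

Note the opposite orientation of the two measures: a recommender ranks items by descending dot product but by ascending distance.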


Flexible Variational Information Bottleneck: Achieving Diverse Compression with a Single Training

arXiv.org Artificial Intelligence

Information Bottleneck (IB) is a widely used framework that enables the extraction of information related to a target random variable from a source random variable. In the objective function, IB controls the trade-off between data compression and predictiveness through the Lagrange multiplier $\beta$. Traditionally, to find the trade-off to be learned, IB requires a search for $\beta$ through multiple training cycles, which is computationally expensive. In this study, we introduce Flexible Variational Information Bottleneck (FVIB), an innovative framework for classification tasks that can obtain optimal models for all values of $\beta$ with a single, computationally efficient training run. We theoretically demonstrate that, across all reasonable values of $\beta$, FVIB can simultaneously maximize an approximation of the objective function for Variational Information Bottleneck (VIB), the conventional IB method. Then we empirically show that FVIB can learn the VIB objective as effectively as VIB. Furthermore, in terms of calibration performance, FVIB outperforms other IB and calibration methods by enabling continuous optimization of $\beta$. Our code is available at https://github.com/sotakudo/fvib.
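To make the role of $\beta$ concrete, here is a minimal sketch of the conventional VIB loss for one classification example, assuming a diagonal-Gaussian encoder and a standard-normal prior. It is written as a loss to minimize (cross-entropy plus $\beta$ times a KL term), which is the negated form of an objective to maximize. Conventions differ on which term $\beta$ multiplies, so treat the placement here as an assumption. This is not the FVIB method, and the function names are illustrative.

```python
import math


def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), in nats."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def cross_entropy(logits, label):
    """Negative log-probability of `label` under softmax(logits)."""
    peak = max(logits)  # log-sum-exp with max subtraction for stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def vib_loss(logits, label, mu, log_var, beta):
    """Prediction term plus beta-weighted compression term."""
    return (cross_entropy(logits, label)
            + beta * gaussian_kl_to_standard_normal(mu, log_var))
```

In the conventional setup each value of `beta` would require its own training run, which is the cost the abstract says FVIB avoids.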


Contrastive variational information bottleneck for aspect-based sentiment analysis

arXiv.org Artificial Intelligence

Deep learning techniques have dominated the literature on aspect-based sentiment analysis (ABSA), achieving state-of-the-art performance. However, deep models generally suffer from spurious correlations between input features and output labels, which hurt robustness and generalization capability by a large margin. In this paper, we propose to reduce spurious correlations for ABSA via a novel Contrastive Variational Information Bottleneck framework (called CVIB). The proposed CVIB framework is composed of an original network and a self-pruned network, and these two networks are optimized simultaneously via contrastive learning. Concretely, we employ the Variational Information Bottleneck (VIB) principle to learn an informative and compressed network (self-pruned network) from the original network, which discards the superfluous patterns or spurious correlations between input features and prediction labels. Then, self-pruning contrastive learning is devised to pull together semantically similar positive pairs and push away dissimilar pairs, where the representations of the anchor learned by the original and self-pruned networks respectively are regarded as a positive pair, while the representations of two different sentences within a mini-batch are treated as a negative pair. To verify the effectiveness of our CVIB method, we conduct extensive experiments on five benchmark ABSA datasets, and the experimental results show that our approach achieves better performance than strong competitors in terms of overall prediction performance, robustness, and generalization. Code and data to reproduce the results in this paper are available at: https://github.com/shesshan/CVIB.
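The positive/negative pairing described in this abstract can be illustrated with a generic InfoNCE-style contrastive loss. This is a sketch only: the abstract does not specify the exact loss used by CVIB, and the cosine similarity, the `temperature` parameter, and the function names are assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor.

    anchor:    representation of a sentence from one network
    positive:  representation of the same sentence from the other network
    negatives: representations of other sentences in the mini-batch
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # stable log-sum-exp
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - pos_logit
```

The loss is small when the anchor is closer to its positive than to every negative, and grows when a negative is more similar than the positive.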


Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

arXiv.org Artificial Intelligence

The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.


Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

arXiv.org Artificial Intelligence

Recent advances in sophisticated synthetic speech generated by text-to-speech (TTS) or voice conversion (VC) systems pose threats to existing automatic speaker verification (ASV) systems. Since such synthetic speech is generated by diverse algorithms, the ability to generalize from limited training data is indispensable for a robust anti-spoofing system. In this work, we propose a transfer learning scheme based on the wav2vec 2.0 pretrained model with a variational information bottleneck (VIB) for the speech anti-spoofing task. Evaluation on the ASVspoof 2019 logical access (LA) database shows that our method improves the performance of distinguishing unseen spoofed and genuine speech, outperforming current state-of-the-art anti-spoofing systems. Furthermore, we show that the proposed system significantly improves performance in low-resource and cross-dataset settings of the anti-spoofing task, demonstrating that our system is also robust in terms of data size and data distribution.


A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck

arXiv.org Artificial Intelligence

Attention-based deep learning models, such as Transformers (Vaswani et al., 2017; Devlin et al., 2019), have achieved unprecedented empirical success in a wide range of cognitive tasks, in particular in natural language processing (NLP). On the other hand, deep variational Bayesian approaches to representation learning, such as variational autoencoders (VAEs) (Kingma and Welling, 2014), have also been very influential, especially due to their variational information bottleneck (VIB) (Alemi et al., 2017; Kingma and Welling, 2014) for regularising the induced latent representations. Previous VIB methods only apply to a vector space, and Transformers crucially do not use a single vector as their latent representation, instead using a set of vectors (Lin et al., 2020; Fang et al., 2021; Park and Lee, 2021). This allows the number of vectors in a Transformer embedding to grow with the size of the input, which is essential for embedding natural language text (Bahdanau et al., 2015), where the size of the input can range from a single word to thousands of words. In this paper, we propose a variational information bottleneck regulariser for set-of-vector latent representations, and use it to regularise the induced latent representation of a Transformer encoder-decoder variational autoencoder.